DM-39582: Add caching for some butler primitives during deserialization #858
Often many butler primitives are deserialized at the same time, and it is useful for these objects to share references to each other. This reduces load time and memory usage.
Downstream code now depends on refs holding UUIDs. Have the YAML loader convert old-style integer IDs to UUIDs early rather than waiting for downstream cleanups.
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #858 +/- ##
==========================================
- Coverage 88.01% 87.89% -0.13%
==========================================
Files 269 270 +1
Lines 35420 35544 +124
Branches 7424 7452 +28
==========================================
+ Hits 31176 31241 +65
- Misses 3103 3145 +42
- Partials 1141 1158 +17
Looks OK, one concern is about the cache key for `DatasetRef` maybe needing `component`. I did not comment on docstring style, I think `ruff` should not take care of that.
doc/changes/DM-39582.api.md
Outdated
@@ -0,0 +1 @@
Deprecate reconstituteDimensions argument from Quantum.from_simple
I think the file name should be `removal`, not `api`, according to the README.
Add backticks around `reconstituteDimensions` and `Quantum.from_simple`, and a period at the end.
# Minimalist component will just specify component and id and
# require registry to reconstruct
if set(simple.dict(exclude_unset=True, exclude_defaults=True)).issubset({"id", "component"}):
if not (simple.datasetType is not None or simple.dataId is not None or simple.run is not None):
Could you rewrite this as `simple.datasetType is None and simple.dataId is None and simple.run is None`? I think it makes it easier to read.
That is not logically the same thing. We only want to run this when they are all `False`. But `and` is greedy, so `False` will always gobble up anything.
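The two conditions in this exchange can be compared directly. The sketch below enumerates every combination of "is `None`" and "is set" for the three fields and collects any case where the forms disagree. The function names and the `"set"` placeholder value are illustrative only. By De Morgan's law, `not (a or b or c)` equals `(not a) and (not b) and (not c)`, so the list comes out empty.

```python
from itertools import product


def original_form(dataset_type, data_id, run):
    # Condition as written in the diff.
    return not (dataset_type is not None or data_id is not None or run is not None)


def suggested_form(dataset_type, data_id, run):
    # Condition as proposed in the review comment.
    return dataset_type is None and data_id is None and run is None


# Try all 2**3 combinations of "unset" (None) and "set" (any non-None value).
mismatches = [
    combo
    for combo in product([None, "set"], repeat=3)
    if original_form(*combo) != suggested_form(*combo)
]
print(mismatches)  # → []
```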
key = frozenset(dataset_ids)
cache = PersistenceContextVars.serializedDatastoreRecordMapping.get()
if cache is not None and (value := cache.get(key)) is not None:
    return value
I would not expect many (or maybe any) cache hits for this. `DatastoreRecordData` is a per-quantum structure; I do not think any two quanta can have the same set of input datasets?
@@ -64,6 +64,8 @@
this version of the code.
"""


_refIntId2UUID = defaultdict[int, uuid.UUID](uuid.uuid4)
Is this a workaround for input YAML files that still have integer dataset IDs in them? Should we instead fix those YAML files? It may also be better to generate reproducible UUIDs in that case if you want to keep this map.
I could not be sure of all the places where this might be happening, so I opted to fix it here. Good point on the reproducibility; I will go with that.
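One possible way to get reproducible UUIDs from old integer IDs is a name-based UUID (`uuid.uuid5`), which always maps the same input to the same output. This is only a sketch of the idea raised in the review, not necessarily what the PR ended up doing; `_INT_ID_NAMESPACE` and `int_id_to_uuid` are hypothetical names, and the namespace string is a placeholder.

```python
import uuid

# Hypothetical namespace for illustration; any fixed UUID works as long as
# it never changes between runs.
_INT_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def int_id_to_uuid(int_id: int) -> uuid.UUID:
    """Map an old-style integer dataset ID to a deterministic UUID."""
    return uuid.uuid5(_INT_ID_NAMESPACE, str(int_id))
```

Unlike a `defaultdict` backed by `uuid.uuid4`, this needs no in-memory map and gives the same UUID for a given integer across processes and sessions.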
MyPy seems to narrow types somehow when comparing Enum Flags directly with equality operators. Compare by value instead.
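A small sketch of what "compare by value" can look like for an `enum.Flag`. The `AccessSketch` class and `is_exactly_read` function are hypothetical and unrelated to the actual enum touched by this commit. The example only shows the pattern of comparing the underlying integers instead of the members themselves.

```python
import enum


# Hypothetical flag enum for illustration only.
class AccessSketch(enum.Flag):
    READ = enum.auto()
    WRITE = enum.auto()


def is_exactly_read(flags: AccessSketch) -> bool:
    # Compare the underlying integer values rather than the members,
    # so the check does not rely on how a type checker narrows `flags`.
    return flags.value == AccessSketch.READ.value
```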
1178cdb to 11e458f
11e458f to ef41fb5
Checklist
doc/changes